Abstract

We address the problem of density estimation with $\mathbb{L}_p$-loss by selection of kernel estimators. We develop a selection procedure and derive corresponding $\mathbb{L}_p$-risk oracle inequalities. It is shown that the proposed selection rule leads to the minimax estimator that is adaptive over a scale of the anisotropic Nikol'skii classes. The main technical tools used in our derivations are uniform bounds on the $\mathbb{L}_p$-norms of empirical processes developed recently in Goldenshluger and Lepski (2010).

1 Introduction

Let $X$ be a random variable in $\mathbb{R}^d$ having density $f$ with respect to the Lebesgue measure. We want to estimate $f$ on the basis of the i.i.d. sample $\mathcal{X}_n = (X_1, \ldots, X_n)$ drawn from $f$.

By an estimator $\hat f$ we mean any measurable real function $\hat f(t) = \hat f(\mathcal{X}_n; t)$, $t \in \mathbb{R}^d$. The accuracy of an estimator $\hat f$ is measured by the $\mathbb{L}_s$-risk:

$$\mathcal{R}_s[\hat f; f] := \big[\mathbb{E}_f \|\hat f - f\|_s^q\big]^{1/q}, \qquad s \in [1,\infty),\ q \ge 1,$$

where $\mathbb{E}_f$ is the expectation with respect to the probability measure $\mathbb{P}_f$ of the observations $\mathcal{X}_n$. The objective is to develop an estimator of $f$ with small $\mathbb{L}_s$-risk.

Kernel density estimates originate in Rosenblatt (1956) and Parzen (1962); this is one of the most popular techniques for estimating densities [Silverman (1986), Devroye and Györfi (1985)]. Let $K : \mathbb{R}^d \to \mathbb{R}$ be a fixed function such that $\int K(x)\,dx = 1$ (we call such functions kernels). Given a bandwidth vector $h = (h_1, \ldots, h_d)$, $h_i > 0$, the kernel estimator $\hat f_h$ of $f$ is defined by

$$\hat f_h(t) := \frac{1}{nV_h} \sum_{i=1}^n K\Big(\frac{t - X_i}{h}\Big) = \frac{1}{n} \sum_{i=1}^n K_h(t - X_i), \tag{1}$$

where $V_h := \prod_{i=1}^d h_i$, $u/v$ for $u, v \in \mathbb{R}^d$ stands for the coordinate-wise division, and $K_h(\cdot) := V_h^{-1} K(\cdot/h)$. It is well known that the accuracy properties of $\hat f_h$ are determined by the choice of the bandwidth $h$, and bandwidth selection is the central problem in kernel density estimation. There are different approaches to the problem of bandwidth selection.
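As an illustration of definition (1), the following is a minimal sketch of a kernel estimator with a bandwidth vector. The particular product kernel, the function names `kernel`, `kde` and `ls_norm`, and the grid-based norm are assumptions made for this example only; they are not prescribed by the text.

```python
import numpy as np

def kernel(u):
    """Illustrative product kernel on [-1/2, 1/2]^d (an assumption, not the paper's choice).
    Each 1-D factor 1.5 * (1 - 4 x^2) integrates to one over [-1/2, 1/2]."""
    u = np.asarray(u, dtype=float)
    factors = np.where(np.abs(u) <= 0.5, 1.5 * (1.0 - 4.0 * u**2), 0.0)
    return factors.prod(axis=-1)

def kde(t, sample, h):
    """Evaluate f_h from (1) at the rows of t; t is (m, d), sample is (n, d), h is (d,)."""
    t = np.asarray(t, dtype=float)
    sample = np.asarray(sample, dtype=float)
    h = np.asarray(h, dtype=float)
    scaled = (t[:, None, :] - sample[None, :, :]) / h   # coordinate-wise division (t - X_i) / h
    return kernel(scaled).sum(axis=1) / (sample.shape[0] * h.prod())  # divide by n * V_h

def ls_norm(values, cell_volume, s):
    """Riemann-sum approximation of the L_s-norm of a function tabulated on a uniform grid."""
    return (np.sum(np.abs(values) ** s) * cell_volume) ** (1.0 / s)
```

With these helpers, the loss $\|\hat f_h - f\|_s$ for a known test density can be approximated by tabulating both functions on a grid and calling `ls_norm` on their difference.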

The minimax approach is based on the assumption that $f$ belongs to a given class of densities $\mathbb{F}$, and the accuracy of $\hat f_h$ is measured by its maximal $\mathbb{L}_s$-risk over the class $\mathbb{F}$,

$$\mathcal{R}_s[\hat f_h; \mathbb{F}] := \sup_{f \in \mathbb{F}} \mathcal{R}_s[\hat f_h; f].$$

Typically $\mathbb{F}$ is a class of smooth functions, e.g., the Hölder functional class. Then the bandwidth $h$ is selected so that the maximal risk $\mathcal{R}_s[\hat f_h; \mathbb{F}]$ (or a reasonable upper bound on it) is minimized with respect to $h$. Such a choice leads to a deterministic bandwidth $h$ depending on the sample size $n$ and on the underlying functional class $\mathbb{F}$. In many cases the kernel estimator constructed in this way is rate optimal (or optimal in order) over the class $\mathbb{F}$. Minimax kernel density estimation with $\mathbb{L}_s$-risks on $\mathbb{R}^d$ was considered in Bretagnolle and Huber (1979), Ibragimov and Khasminskii (1980, 1981), Devroye and Györfi (1985), Hasminskii and Ibragimov (1990), Donoho et al. (1996), Kerkyacharian, Picard and Tribouley (1996), Juditsky and Lambert-Lacroix (2004), and Mason (2009), where further references can be found.

The oracle approach considers a set of kernel estimators $\mathcal{F}(\mathcal{H}) = \{\hat f_h, h \in \mathcal{H}\}$, and aims at a measurable data-driven choice $\hat h \in \mathcal{H}$ such that for every $f$ from a large functional class the following $\mathbb{L}_s$-risk oracle inequality holds:

$$\mathcal{R}_s[\hat f_{\hat h}; f] \le C \inf_{h \in \mathcal{H}} \mathcal{R}_s[\hat f_h; f] + \delta_n. \tag{2}$$

Here $C$ is a constant independent of $f$ and $n$, and the remainder $\delta_n$ does not depend on $f$. Oracle inequalities with a "small" remainder term $\delta_n$ and a constant $C$ close to one are of prime interest; they are key tools for establishing minimax and adaptive minimax results in estimation problems. To the best of our knowledge, oracle inequalities of the type (2) were established only in the cases $s = 1$ and $s = 2$. Devroye and Lugosi (1996, 1997, 2001) established oracle inequalities for $s = 1$. The case $s = 2$ was studied by Massart (2007, Chapter 7), Samarov and Tsybakov (2007), Rigollet and Tsybakov (2007) and Birgé (2008). The last cited paper contains a detailed discussion of recent developments in this area.

The contribution of this paper is two-fold. First, we propose a selection procedure for a set of kernel estimators, and establish the corresponding $\mathbb{L}_s$-risk, $s \in [1,\infty)$, oracle inequalities of the type (2). Second, we demonstrate that our selection rule leads to a minimax adaptive estimator over a scale of the anisotropic Nikol'skii classes (see Section 3 below for the class definition).

More specifically, let $h^{\min} = (h_1^{\min}, \ldots, h_d^{\min})$ and $h^{\max} = (h_1^{\max}, \ldots, h_d^{\max})$ be two fixed vectors satisfying $0 < h_i^{\min} \le h_i^{\max} \le 1$, $\forall i$, and let

$$\mathcal{H} := \bigotimes_{i=1}^d \big[h_i^{\min}, h_i^{\max}\big]. \tag{3}$$

Consider the set of kernel estimators

$$\mathcal{F}(\mathcal{H}) = \{\hat f_h, h \in \mathcal{H}\}, \tag{4}$$

where $\hat f_h$ is given in (1). We propose a measurable choice $\hat h \in \mathcal{H}$ such that the resulting estimator $\hat f = \hat f_{\hat h}$ satisfies the following oracle inequality:

$$\mathcal{R}_s[\hat f_{\hat h}; f] \le \inf_{h \in \mathcal{H}} \Big\{ (1 + 3\|K\|_1)\, \mathcal{R}_s[\hat f_h; f] + C_s (nV_h)^{-\gamma_s} \Big\} + \delta_{n,s}. \tag{5}$$

The constants $C_s$, $\gamma_s$, and the remainder term $\delta_{n,s}$ admit different expressions depending on the value of $s$.

• If $s \in [1,2)$ then (5) holds for all densities $f$ with $\gamma_s = 1 - 1/s$, $C_s$ depending on the kernel $K$ only, and with

$$\delta_{n,s} = c_1 (\ln n)^{c_2}\, n^{1/s} \exp\{-c_3 n^{2/s-1}\}$$

for some constants $c_i$, $i = 1, 2, 3$.

• If $s \in [2,\infty)$ then (5) holds for all densities $f$ uniformly bounded by a constant $f_\infty$ with $\gamma_s = 1/2$, $C_s$ depending on $K$ and $f_\infty$ only, and with

$$\delta_{n,s} = c_1 (\ln n)^{c_2}\, n^{1/2} \exp\{-c_3 V_{\max}^{-2/s}\}, \qquad V_{\max} := V_{h^{\max}},$$

for some constants $c_i$, $i = 1, 2, 3$. We emphasize that the proposed selection rule is fully data-driven and does not use information on the value of $f_\infty$.

Thus the oracle inequality (5) holds with a remainder $\delta_{n,s}$ that is negligibly small in terms of dependence on $n$ (by choice of $V_{\max}$ in the case $s \in [2,\infty)$). We stress that explicit non-asymptotic expressions for $C_s$, $c_1$, $c_2$ and $c_3$ are available. It is important to realize that the term $C_s (nV_h)^{-\gamma_s}$ is a tight upper bound on the stochastic error of the kernel estimator $\hat f_h$. This fact allows us to derive rate optimal estimators that adapt to the unknown smoothness of the density $f$. In particular, in Section 3 we apply our oracle inequalities in order to develop a rate optimal adaptive kernel estimator for the anisotropic Nikol'skii classes. Minimax estimation of densities from such classes was studied in Ibragimov and Khasminskii (1981), while the problem of adaptive estimation was not considered in the literature.
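To see concretely why the remainder can be made negligible when $s \in [2,\infty)$, it may help to plug a specific bandwidth range into the remainder $\delta_{n,s} = c_1 (\ln n)^{c_2} n^{1/2} \exp\{-c_3 V_{\max}^{-2/s}\}$. The following worked step assumes $V_{\max} = [\kappa \ln n]^{-s/2}$ with a constant $\kappa > 0$ chosen by the statistician; $\kappa$ is a free parameter of this illustration.

$$\delta_{n,s} = c_1 (\ln n)^{c_2}\, n^{1/2} \exp\{-c_3\, \kappa \ln n\} = c_1 (\ln n)^{c_2}\, n^{1/2 - c_3 \kappa}.$$

Under this assumption the remainder decays polynomially in $n$ at any prescribed order once $\kappa$ is taken large enough relative to $1/c_3$.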

The paper is structured as follows. In Section 2 we define our selection rule and prove key oracle inequalities. Section 3 discusses adaptive rate optimal estimation of densities for a scale of anisotropic Nikol'skii classes. Proofs of all results are given in Section 4.

2 Selection rule and oracle inequalities 

Let $\mathcal{F}(\mathcal{H})$ be the set of kernel density estimators defined in (4). We want to select an estimator from the family $\mathcal{F}(\mathcal{H})$. For this purpose we need to impose some assumptions and establish notation that will be used in the definition of our selection procedure.

2.1 Assumptions 

The following assumptions on the kernel K will be used throughout the paper. 

(K1) The kernel $K$ satisfies the Lipschitz condition

$$|K(x) - K(y)| \le L_K |x - y|, \qquad \forall x, y \in \mathbb{R}^d,$$

where $|\cdot|$ denotes the Euclidean distance. Moreover, $K$ is compactly supported, and, without loss of generality, $\mathrm{supp}(K) \subseteq [-1/2, 1/2]^d$.

(K2) There exists a real number $k_\infty < \infty$ such that $\|K\|_\infty \le k_\infty$.

Assumptions (K1) and (K2) are rather standard in kernel density estimation. We note that Assumption (K1) can be weakened in several ways. For example, it suffices to assume that $K$ belongs to the isotropic Hölder ball of functions $\mathbb{H}_d(\alpha, L_K)$ with any $\alpha > 0$ (in Assumption (K1) $\alpha = 1$).

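Sometimes we will suppose that $f \in \mathbb{F}$, where

$$\mathbb{F} := \Big\{ p : \mathbb{R}^d \to \mathbb{R} :\ p \ge 0,\ \int p = 1,\ \|p\|_\infty \le f_\infty < \infty \Big\},$$

and $f_\infty$ is a fixed constant. Without loss of generality we assume that $f_\infty \ge 1$.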
2.2 Notation 

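For any $U : \mathbb{R}^d \to \mathbb{R}$ and $s \in [1,\infty)$ define

$$\rho_s(U) := \begin{cases} 4 n^{1/s-1} \|U\|_s, & s \in [1,2), \\ n^{-1/2} \|U\|_2, & s = 2, \end{cases}$$

and if $s \in (2,\infty)$ then we set

$$\rho_s(U) := c_s \bigg\{ n^{-1/2} \bigg( \int \Big[ \int U^2(t-x) f(x)\,dx \Big]^{s/2} dt \bigg)^{1/s} + 2 n^{1/s-1} \|U\|_s \bigg\},$$

where $c_s := 15s/\ln s$ is the best known constant in the Rosenthal inequality (Johnson, Schechtman and Zinn 1985). Observe that $\rho_s(U)$ depends on $f$ when $s \in (2,\infty)$; hence we will also consider the empirical counterpart of $\rho_s(U)$:

$$\hat\rho_s(U) := c_s \bigg\{ n^{-1/2} \bigg( \int \Big[ \frac{1}{n} \sum_{i=1}^n U^2(t - X_i) \Big]^{s/2} dt \bigg)^{1/s} + 2 n^{1/s-1} \|U\|_s \bigg\}.$$

We put also

$$r_s(U) := \rho_s(U) \vee n^{-1/2}\|U\|_2, \qquad \hat r_s(U) := \hat\rho_s(U) \vee n^{-1/2}\|U\|_2,$$

and

$$g_s(U) := \begin{cases} 32\, \rho_s(U), & s \in [1,2), \\ \tfrac{25}{3}\, \rho_2(U), & s = 2, \\ 32\, \hat r_s(U), & s > 2. \end{cases}$$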
Armed with this notation we are ready to describe our selection rule. 
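As a quick worked example of this notation (an illustration, not part of the original derivation), take $U = K_h$ in the cases $s \in [1,2)$ and $s = 2$, reading the constants $4$, $32$ and $25/3$ from the definitions of $\rho_s$ and $g_s$. A change of variables gives $\|K_h\|_s = V_h^{1/s-1} \|K\|_s$, so that

$$g_s(K_h) = 128\, n^{1/s-1} \|K_h\|_s = 128\, \|K\|_s\, (nV_h)^{1/s-1}, \quad s \in [1,2), \qquad g_2(K_h) = \tfrac{25}{3}\, n^{-1/2} \|K_h\|_2 = \tfrac{25}{3}\, \|K\|_2\, (nV_h)^{-1/2}.$$

In both cases $g_s(K_h)$ is a deterministic quantity of the same order in $nV_h$ as the stochastic-error term $(nV_h)^{-\gamma_s}$ appearing in (5).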

2.3 Selection rule 

The rule is based on auxiliary estimators $\{\hat f_{h,\eta},\ h, \eta \in \mathcal{H}\}$ that are defined as follows: for every pair $h, \eta \in \mathcal{H}$ we let

$$\hat f_{h,\eta}(t) := \frac{1}{n} \sum_{i=1}^n [K_h * K_\eta](t - X_i),$$

where $*$ stands for the convolution on $\mathbb{R}^d$. Define also

$$m_s(h,\eta) := g_s(K_\eta) + g_s(K_h * K_\eta), \quad \forall h, \eta \in \mathcal{H}, \qquad m_s^*(h) := \sup_{\eta \in \mathcal{H}} m_s(\eta, h), \quad \forall h \in \mathcal{H}. \tag{6}$$

For every $h \in \mathcal{H}$ let

$$\hat R_h := \sup_{\eta \in \mathcal{H}} \Big[ \|\hat f_{h,\eta} - \hat f_\eta\|_s - m_s(h,\eta) \Big]_+ + m_s^*(h). \tag{7}$$

Then the selected bandwidth $\hat h$ and the corresponding kernel density estimator are defined by

$$\hat h := \arg\inf_{h \in \mathcal{H}} \hat R_h, \qquad \hat f = \hat f_{\hat h}. \tag{8}$$

The selection rule (6)-(8) is a refinement of the one introduced recently in Goldenshluger and Lepski (2008, 2009) for the Gaussian white noise model.
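The rule (6)-(8) can be sketched numerically in the simplest setting $d = 1$, $s = 2$, where $g_2(U) = \tfrac{25}{3} n^{-1/2} \|U\|_2$ depends on the kernel only. The sketch below makes several assumptions that are not part of the text: the continuum $\mathcal{H}$ is replaced by a finite list of candidate bandwidths, all norms and convolutions are approximated by Riemann sums on a uniform grid, and the kernel and the names `kernel` and `select_bandwidth` are hypothetical choices for illustration.

```python
import numpy as np

def kernel(u):
    """Illustrative 1-D kernel supported on [-1/2, 1/2] (an assumption, not the paper's choice)."""
    return np.where(np.abs(u) <= 0.5, 1.5 * (1.0 - 4.0 * u**2), 0.0)

def select_bandwidth(sample, bandwidths, grid):
    """Grid-based sketch of rule (6)-(8) for d = 1 and s = 2 over a finite bandwidth list."""
    n, dt = len(sample), grid[1] - grid[0]
    mid = grid[len(grid) // 2]                           # reference point for tabulating kernels
    l2 = lambda v: np.sqrt(np.sum(v**2) * dt)            # Riemann-sum L_2 norm
    g2 = lambda U: (25.0 / 3.0) * l2(U) / np.sqrt(n)     # g_2(U) = (25/3) n^{-1/2} ||U||_2
    diff = grid[:, None] - grid[None, :]
    # smooth[eta] @ v approximates the convolution (K_eta * v) on the grid
    smooth = {h: kernel(diff / h) / h * dt for h in bandwidths}
    K = {h: kernel((grid - mid) / h) / h for h in bandwidths}   # K_h centred at mid
    f = {h: kernel((grid[:, None] - sample[None, :]) / h).mean(axis=1) / h
         for h in bandwidths}                                   # estimators f_h
    risk = {}
    for h in bandwidths:
        excess, m_star = 0.0, 0.0
        for eta in bandwidths:
            KK = smooth[eta] @ K[h]                      # K_h * K_eta (convolution commutes)
            f_h_eta = smooth[eta] @ f[h]                 # auxiliary estimator f_{h,eta}
            m_h_eta = g2(K[eta]) + g2(KK)                # majorant m_2(h, eta)
            m_eta_h = g2(K[h]) + g2(KK)                  # m_2(eta, h), used in m*_2(h)
            excess = max(excess, l2(f_h_eta - f[eta]) - m_h_eta)  # sup of the positive part
            m_star = max(m_star, m_eta_h)
        risk[h] = excess + m_star                        # R_h from (7)
    h_hat = min(risk, key=risk.get)                      # minimiser from (8)
    return h_hat, risk
```

The grid should be wide enough to contain the supports of all $\hat f_{h,\eta}$; otherwise the Riemann sums truncate the norms.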

Remarks. 

1. It is easy to check that Assumption (K1) implies that $\hat R_h$ and $m_s^*(h)$ are continuous random functions on the compact subset $\mathcal{H} \subset \mathbb{R}^d$. Thus, $\hat h$ exists and is measurable (Jennrich 1969).

2. We call the function $m_s(\cdot,\cdot)$ the majorant. In fact, if $\xi_h$ and $\xi_{h,\eta}$ denote the stochastic errors of the estimators $\hat f_h$ and $\hat f_{h,\eta}$ respectively, i.e., if

$$\xi_h(t) := \frac{1}{n} \sum_{i=1}^n \big[ K_h(t - X_i) - \mathbb{E}_f K_h(t - X) \big],$$

$$\xi_{h,\eta}(t) := \frac{1}{n} \sum_{i=1}^n \big\{ [K_h * K_\eta](t - X_i) - \mathbb{E}_f [K_h * K_\eta](t - X) \big\},$$

then it is seen from the proof of Theorems 1 and 2 below that $m_s(h,\eta)$ uniformly "majorates" $\|\xi_{h,\eta} - \xi_\eta\|_s$ in the sense that the expectation $\mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta} - \xi_\eta\|_s - m_s(h,\eta) \big]_+^q$ is "small."

3. It is important to realize that the majorant $m_s(h,\eta)$ does not depend on the density $f$ to be estimated. The majorant is completely determined by the kernel $K$ and the observations, and thus it is available to the statistician.

2.4 Oracle inequalities 

Now we are in a position to establish oracle inequalities on the risk of the estimator $\hat f = \hat f_{\hat h}$ given by (6)-(8). Put

$$A_{\mathcal{H}} := \prod_{i=1}^d \big[ 1 \vee \ln(h_i^{\max}/h_i^{\min}) \big], \qquad B_{\mathcal{H}} := \big[ 1 \vee \log_2(V_{\max}/V_{\min}) \big],$$

where from now on

$$V_{\min} := \prod_{i=1}^d h_i^{\min}, \qquad V_{\max} := \prod_{i=1}^d h_i^{\max}.$$

The next two statements, Theorem 1 and Theorem 2, provide oracle inequalities on the $\mathbb{L}_s$-risk of $\hat f$ in the cases $s \in [1,2]$ and $s \in (2,\infty)$ respectively.

Theorem 1. Let Assumptions (K1) and (K2) hold.

(i) If $s \in [1,2)$ then for all $f$ and $n \ge 4^{2s/(2-s)}$

$$\mathcal{R}_s[\hat f; f] \le \inf_{h \in \mathcal{H}} \Big\{ (1 + 3\|K\|_1)\, \mathcal{R}_s[\hat f_h; f] + C_1 (nV_h)^{1/s-1} \Big\} + C_2 A_{\mathcal{H}}^{4/q}\, n^{1/s} \exp\Big\{ -\frac{2 n^{2/s-1}}{37 q} \Big\}. \tag{9}$$

(ii) If $s = 2$ and $f_\infty^2 V_{\max} + 4 n^{-1/2} \le 1/8$ then for all $f \in \mathbb{F}$

$$\mathcal{R}_s[\hat f; f] \le \inf_{h \in \mathcal{H}} \Big\{ (1 + 3\|K\|_1)\, \mathcal{R}_s[\hat f_h; f] + C_3 (nV_h)^{-1/2} \Big\} + C_4 A_{\mathcal{H}}^{4/q}\, n^{1/2} \exp\Big\{ -\frac{1}{16 q\, [f_\infty^2 V_{\max} + 4 n^{-1/2}]} \Big\}. \tag{10}$$

Here $C_1$ and $C_3$ are absolute constants, while $C_2$ and $C_4$ depend on $L_K$, $k_\infty$, $d$ and $q$ only.

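Theorem 2. Let Assumptions (K1) and (K2) hold, $f \in \mathbb{F}$, $s \in (2,\infty)$, and assume that for some $C_1 = C_1(K, s, d) > 1$

$$nV_{\min} > C_1, \qquad V_{\max} \ge 1/\sqrt{n}.$$

If $n \ge C_2$ for some constant $C_2$ depending on $L_K$, $k_\infty$, $f_\infty$, $d$, and $s$ only, then

$$\mathcal{R}_s[\hat f; f] \le \inf_{h \in \mathcal{H}} \Big\{ (1 + 3\|K\|_1)\, \mathcal{R}_s[\hat f_h; f] + C_3 f_\infty^{1/2} (nV_h)^{-1/2} \Big\} + C_4 A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{1/2} \Big[ \exp\{-C_5 b_{n,s}\} + \exp\{-C_6 f_\infty^{-1} V_{\max}^{-2/s}\} \Big], \tag{11}$$

where $b_{n,s} := n^{4/s-1}$ if $s \in (2,4)$, and $b_{n,s} := [f_\infty V_{\max}]^{-1}$ if $s \ge 4$. The constants $C_i$, $i = 3, \ldots, 6$, depend on $L_K$, $k_\infty$, $d$, $q$ and $s$ only.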

Remarks. 

1. All constants appearing in Theorems 1 and 2 can be expressed explicitly [see Lemmas 1 and 2 below and corresponding results in Goldenshluger and Lepski (2010) for details].

2. We will show that for given $h$ the expected value of the stochastic error of the estimator $\hat f_h$, i.e. $(\mathbb{E}_f \|\xi_h\|_s^q)^{1/q}$, admits an upper bound of the order $O((nV_h)^{1/s-1})$ when $s \in [1,2)$, and $O((nV_h)^{-1/2})$ when $s \in (2,\infty)$. It is also obvious that

$$\mathcal{R}_s[\hat f_h; f] \le \|B_h\|_s + \big( \mathbb{E}_f \|\xi_h\|_s^q \big)^{1/q},$$

where $B_h(f,t) := \int K_h(t - x) f(x)\,dx - f(t)$, $t \in \mathbb{R}^d$. Thus, our estimator attains, up to a constant and a remainder term, the minimum of the sum of the bias and the upper bound on the stochastic error. This form of the oracle inequality is convenient for deriving minimax and minimax adaptive results [see Section 3]. Indeed, bounds on the bias and the stochastic error are usually developed separately and require completely different techniques.

3. We note that $A_{\mathcal{H}} \le O([\ln n]^d)$ and $B_{\mathcal{H}} \le O(\ln n)$ for any set $\mathcal{H} \subset [0,1]^d$ such that $h_i^{\min} \ge O(n^{-c})$, $c > 0$, $\forall i = 1, \ldots, d$. If $s \in (2,\infty)$, and if the set of considered bandwidths $\mathcal{H}$ is such that $V_{\max} = [\kappa \ln n]^{-s/2}$ for some $\kappa > 0$, then the second term on the right hand side of (10) and (11) can be made negligibly small by choice of the constant $\kappa$. Observe that conditions ensuring consistency of $\hat f_h$ are $nV_h \to \infty$ and $V_h \to 0$ as $n \to \infty$; thus the requirement $V_{\max} = [\kappa \ln n]^{-s/2}$ is not restrictive. Note also that in the case $s \in [1,2)$ the second term on the right hand side of (9) is exponentially small in $n$ for any $\mathcal{H}$.

4. The condition $V_{\max} \ge 1/\sqrt{n}$ is imposed only for the sake of convenience in the presentation of our results. Clearly, we would like to have the set $\mathcal{H}$ as large as possible; hence consideration of vectors $h^{\max}$ such that $V_{\max} = V_{h^{\max}} < 1/\sqrt{n}$ makes little sense.

5. It should also be mentioned that if for $s \in [1,2)$ we impose additional conditions on $f$ [e.g., such as the domination condition in Donoho et al. (1996, p. 514)], then the order of the stochastic error of $\hat f_h$ can be improved to $O((nV_h)^{-1/2})$. It is well known that a smoothness condition alone is not sufficient for consistent density estimation on $\mathbb{R}^d$ with $\mathbb{L}_1$-losses (Ibragimov and Khasminskii 1981).
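The quantities $A_{\mathcal{H}} = \prod_{i=1}^d [1 \vee \ln(h_i^{\max}/h_i^{\min})]$ and $B_{\mathcal{H}} = [1 \vee \log_2(V_{\max}/V_{\min})]$ that enter the remainder terms are easy to evaluate for a given bandwidth box. The following minimal sketch computes them; the function names `a_h` and `b_h` are hypothetical.

```python
import math

def a_h(h_min, h_max):
    """A_H = prod_i max(1, ln(h_i^max / h_i^min)) for coordinate-wise bandwidth bounds."""
    return math.prod(max(1.0, math.log(hi / lo)) for lo, hi in zip(h_min, h_max))

def b_h(h_min, h_max):
    """B_H = max(1, log2(V_max / V_min)), where V is the product of the coordinates."""
    return max(1.0, math.log2(math.prod(h_max) / math.prod(h_min)))
```

For instance, with $h_i^{\min} = 1/n$ and $h_i^{\max} = 1$ in dimension $d$, these functions return $(\ln n)^d$ and $d \log_2 n$, in line with the orders quoted in Remark 3.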

2.5 $\mathbb{L}_s$-risk oracle inequalities

As was mentioned above, the oracle inequalities of Theorems 1 and 2 are useful for the derivation of adaptive rate optimal estimators. Moreover, they are established under very mild assumptions on the density $f$. However, traditionally oracle inequalities compare the risk of a proposed estimator to the risk of the best estimator in the given family, cf. (2). The natural question is whether an $\mathbb{L}_s$-risk oracle inequality of the type (2) can be derived from the results of Theorems 1 and 2. In this section we provide an answer to this question. We will be mostly interested in finding minimal assumptions on the underlying density $f$ that are sufficient for establishing the $\mathbb{L}_s$-risk oracle inequality. It will be shown that this problem is directly related to establishing a lower bound on the term $(\mathbb{E}_f \|\xi_h\|_s^q)^{1/q}$.

Let $\mu \in (0,1)$ and $\nu > 0$ be fixed real numbers. Denote by $\mathbb{F}_{\mu,\nu}$ the set of all probability densities $p$ satisfying the following condition:

$$\exists B \in \mathfrak{B}(\mathbb{R}^d) : \quad \mathrm{mes}(B) \le \nu, \quad \int_B p \ge \mu.$$

Here $\mathfrak{B}(\mathbb{R}^d)$ is the Borel $\sigma$-algebra on $\mathbb{R}^d$, and $\mathrm{mes}(\cdot)$ is the Lebesgue measure on $\mathbb{R}^d$.

Below we will assume that $f \in \mathbb{F}_{\mu,\nu}$ for some $\mu$ and $\nu$. This condition is very weak. For example, if $\mathcal{F}$ is a set of densities such that either (i) $\mathcal{F}$ is a totally bounded subset of $\mathbb{L}_1(\mathbb{R}^d)$; or (ii) the family of probability measures $\{\mathbb{P}_f, f \in \mathcal{F}\}$ is tight, then for any $\mu \in (0,1)$ there exists $0 < \nu < \infty$ such that $\mathcal{F} \subseteq \mathbb{F}_{\mu,\nu}$. The statement (i) is a consequence of the Kolmogorov-Riesz compactness theorem.

Theorem 3. Let $s \in [2,\infty)$ and suppose that the assumptions of Theorem 1(ii) and Theorem 2 are fulfilled. If $s > 2$ then assume additionally that $f \in \mathbb{F}_{\mu,\nu}$ for some $\mu$ and $\nu$, and

^max ^2 /I 

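If $n \ge C_1 = C_1(L_K, k_\infty, f_\infty, d, s)$ then there exists a constant $C_0 > 0$ ($C_0 = C_0(K)$ if $s = 2$, and $C_0 = C_0(K, \mu, \nu, s)$ if $s > 2$) such that

$$\mathcal{R}_s[\hat f; f] \le C_0 \inf_{h \in \mathcal{H}} \mathcal{R}_s[\hat f_h; f] + C_2 A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{1/2} \Big[ \exp\{-C_3 b_{n,s}\} + \exp\{-C_4 V_{\max}^{-2/s}\} \Big],$$

where $b_{n,s} := n^{4/s-1}$ if $s \in (2,4)$, and $b_{n,s} := [f_\infty V_{\max}]^{-1}$ if $s \ge 4$. The constants $C_i$ depend on $L_K$, $k_\infty$, $d$, $q$ and $s$ only.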
The proof indicates that Theorem 3 follows from the fact that for any $s \in [2,\infty)$ one has

$$\big[ \mathbb{E}_f \|\xi_h\|_s^q \big]^{1/q} \ge c\, (nV_h)^{-1/2}, \qquad \forall h, \tag{12}$$

where $c > 0$ is a constant. This lower bound holds under very weak conditions on the density $f$ (for arbitrary $f$ if $s = 2$, and for $f \in \mathbb{F}_{\mu,\nu}$ if $s > 2$). In order to prove a similar $\mathbb{L}_s$-risk oracle inequality in the case $s \in [1,2)$ it would be sufficient to show that $[\mathbb{E}_f \|\xi_h\|_s^q]^{1/q} \ge c\, (nV_h)^{-1+1/s}$ for any $h$. However, the last lower bound cannot hold in such generality as (12). In particular, according to Remark 5 after Theorem 2, $[\mathbb{E}_f \|\xi_h\|_s^q]^{1/q} \le c\, (nV_h)^{-1/2}$ for all $h$ under a tail domination condition (e.g., for compactly supported densities). Under such a domination condition the corresponding $\mathbb{L}_s$-risk oracle inequality can be easily established using the same arguments as in the proof of Theorem 3.

3 Adaptive estimation of densities with anisotropic smoothness

In this section we illustrate the use of the oracle inequalities of Theorems 1 and 2 for the derivation of adaptive rate optimal density estimators.

We start with the definition of the anisotropic Nikol'skii class of functions.

Definition 1. Let $\alpha = (\alpha_1, \ldots, \alpha_d)$, $\alpha_i > 0$ and $L > 0$. We say that a density $f : \mathbb{R}^d \to \mathbb{R}$ belongs to the anisotropic Nikol'skii class $\mathbb{N}_{s,d}(\alpha, L)$ of functions if

(i) $\|D_i^{\lfloor \alpha_i \rfloor} f\|_s \le L$ for all $i = 1, \ldots, d$;

(ii) for all $i = 1, \ldots, d$, and all $z \in \mathbb{R}$

$$\bigg\{ \int \Big| D_i^{\lfloor \alpha_i \rfloor} f(t_1, \ldots, t_i + z, \ldots, t_d) - D_i^{\lfloor \alpha_i \rfloor} f(t_1, \ldots, t_i, \ldots, t_d) \Big|^s dt \bigg\}^{1/s} \le L |z|^{\alpha_i - \lfloor \alpha_i \rfloor}.$$

Here $D_i^k f$ denotes the $k$th order partial derivative of $f$ with respect to the variable $t_i$, and $\lfloor \alpha_i \rfloor$ is the largest integer strictly less than $\alpha_i$.

The functional classes $\mathbb{N}_{s,d}(\alpha, L)$ were considered in approximation theory by Nikol'skii; see, e.g., Nikol'skii (1969). Minimax estimation of densities from the class $\mathbb{N}_{s,d}(\alpha, L)$ was considered in Ibragimov and Khasminskii (1981). We refer also to Kerkyacharian, Lepski and Picard (2001) where the problem of adaptive estimation over a scale of classes $\mathbb{N}_{s,d}(\alpha, L)$ was treated for the Gaussian white noise model.

Consider the following family of kernel estimators. Let $u$ be an integrable, compactly supported function on $\mathbb{R}$ such that $\int u(y)\,dy = 1$. As in Kerkyacharian, Lepski and Picard (2001), for some integer number $l$ we put

$$u_l(y) := \sum_{k=1}^{l} \binom{l}{k} (-1)^{k+1} \frac{1}{k}\, u\Big(\frac{y}{k}\Big),$$

and define

$$K(t) := \prod_{i=1}^d u_l(t_i), \qquad t = (t_1, \ldots, t_d). \tag{13}$$

The kernel $K$ constructed in this way is bounded and compactly supported, and it is easily verified that

$$\int K(t)\,dt = 1, \qquad \int K(t)\, t^k\,dt = 0, \quad \forall |k| = 1, \ldots, l-1,$$

where $k = (k_1, \ldots, k_d)$ is the multi-index, $k_i \ge 0$, $|k| = k_1 + \cdots + k_d$, and $t^k = t_1^{k_1} \cdots t_d^{k_d}$ for $t = (t_1, \ldots, t_d)$.
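The vanishing-moment property can be checked directly. Assuming the one-dimensional construction $u_l(y) = \sum_{k=1}^{l} \binom{l}{k} (-1)^{k+1} k^{-1} u(y/k)$ from Kerkyacharian, Lepski and Picard (2001), the substitution $y = kz$ shows that the $j$th moment of $u_l$ equals the $j$th moment of $u$ multiplied by the combinatorial factor $\sum_{k=1}^{l} \binom{l}{k} (-1)^{k+1} k^j$. The sketch below evaluates this factor exactly; the function name `moment_factor` is hypothetical.

```python
from math import comb

def moment_factor(l, j):
    """Return sum_{k=1}^{l} C(l, k) * (-1)^(k+1) * k^j.

    The j-th moment of u_l equals this factor times the j-th moment of u,
    so the factor must be 1 for j = 0 and 0 for j = 1, ..., l - 1."""
    return sum(comb(l, k) * (-1) ** (k + 1) * k ** j for k in range(1, l + 1))
```

Because the factor is computed in integer arithmetic, the check is exact and does not depend on the choice of the base function $u$.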

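For fixed $\alpha = (\alpha_1, \ldots, \alpha_d)$ let $\bar\alpha$ be defined by the relation $1/\bar\alpha = \sum_{i=1}^d (1/\alpha_i)$. Define also

$$\varphi_{n,s}(\alpha) := L^{\gamma_s/(\gamma_s + \bar\alpha)}\, n^{-\gamma_s \bar\alpha/(\gamma_s + \bar\alpha)}, \qquad \gamma_s := \begin{cases} 1 - 1/s, & s \in (1,2], \\ 1/2, & s \in (2,\infty). \end{cases}$$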



Theorem 4. Let $\mathcal{F}(\mathcal{H})$ be the family of kernel estimators defined in (1), (3) and (4) that is associated with the kernel (13). Let $\hat f$ denote the estimator given by selection according to our rule from the family $\mathcal{F}(\mathcal{H})$.

(i) Let $s \in (1,2)$, and assume that $h_i^{\min} = 1/n$ and $h_i^{\max} = 1$, $\forall i = 1, \ldots, d$. Then for any class $\mathbb{N}_{s,d}(\alpha, L)$ such that $\max_{i=1,\ldots,d} \lfloor \alpha_i \rfloor \le l - 1$, $L > 0$, one has

$$\limsup_{n \to \infty} \Big\{ [\varphi_{n,s}(\alpha)]^{-1}\, \mathcal{R}_s[\hat f; \mathbb{N}_{s,d}(\alpha, L)] \Big\} \le C < \infty.$$

(ii) Let $s \in [2,\infty)$, and assume that $h_i^{\min} = \kappa_1/n$ and $h_i^{\max} = [\kappa_2 \ln n]^{-s/(2d)}$, $\forall i = 1, \ldots, d$, for some constants $\kappa_1$ and $\kappa_2$. Then for any class $\mathbb{N}_{s,d}(\alpha, L)$ such that $\max_{i=1,\ldots,d} \lfloor \alpha_i \rfloor \le l - 1$, $L > 0$, one has

$$\limsup_{n \to \infty} \Big\{ [\varphi_{n,s}(\alpha)]^{-1}\, \mathcal{R}_s[\hat f; \mathbb{N}_{s,d}(\alpha, L)] \Big\} \le C < \infty.$$

It is well known that $\varphi_{n,s}(\alpha)$ is the minimax rate of convergence in estimation of densities from the class $\mathbb{N}_{s,d}(\alpha, L)$ [see Ibragimov and Khasminskii (1981) and Hasminskii and Ibragimov (1990)]. Therefore Theorem 4 shows that our estimator $\hat f$ is adaptive minimax over a scale of the classes $\mathbb{N}_{s,d}(\alpha, L)$.
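To get a feel for the rate, the following sketch evaluates it numerically together with a bandwidth that balances bias against stochastic error. It assumes the rate has the form $L^{\gamma_s/(\gamma_s+\bar\alpha)} n^{-\gamma_s \bar\alpha/(\gamma_s+\bar\alpha)}$ with $1/\bar\alpha = \sum_i 1/\alpha_i$, $\gamma_s = 1 - 1/s$ for $s \in (1,2]$ and $\gamma_s = 1/2$ for $s > 2$, and it sets all unspecified multiplicative constants to one. The function names are hypothetical.

```python
def gamma_s(s):
    """Exponent of the stochastic-error term (n V_h)^(-gamma_s)."""
    return 1.0 - 1.0 / s if s <= 2 else 0.5

def effective_smoothness(alpha):
    """Scalar smoothness defined by 1 / alpha_bar = sum_i 1 / alpha_i."""
    return 1.0 / sum(1.0 / a for a in alpha)

def rate(n, s, alpha, L=1.0):
    """Assumed rate L^(g / (g + a)) * n^(-g a / (g + a)), constants set to one."""
    g, a = gamma_s(s), effective_smoothness(alpha)
    return L ** (g / (g + a)) * n ** (-g * a / (g + a))

def oracle_bandwidth(n, s, alpha, L=1.0):
    """Bandwidth with h_i^alpha_i equal to L^(-a / (g + a)) * n^(-g a / (g + a)) for every i."""
    g, a = gamma_s(s), effective_smoothness(alpha)
    delta = L ** (-a / (g + a)) * n ** (-g * a / (g + a))
    return [delta ** (1.0 / ai) for ai in alpha]
```

With this bandwidth the bias proxy $L h_i^{\alpha_i}$ and the stochastic-error proxy $(nV_h)^{-\gamma_s}$ coincide and both equal `rate(n, s, alpha, L)`. In the isotropic case $d = 1$, $s = 2$, $\alpha = 2$ the exponent reduces to the classical $n^{-2/5}$.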

4 Proofs 

First we recall that the accuracy of the estimators $\hat f_h$ and $\hat f_{h,\eta}$, $h, \eta \in \mathcal{H}$, is characterized by the bias and stochastic error given by

$$B_h(f,t) := \int K_h(t - x) f(x)\,dx - f(t), \qquad \xi_h(t) := \frac{1}{n} \sum_{i=1}^n \big[ K_h(t - X_i) - \mathbb{E}_f K_h(t - X) \big],$$

and

$$B_{h,\eta}(f,t) := \int [K_h * K_\eta](t - x) f(x)\,dx - f(t), \qquad \xi_{h,\eta}(t) := \frac{1}{n} \sum_{i=1}^n \big\{ [K_h * K_\eta](t - X_i) - \mathbb{E}_f [K_h * K_\eta](t - X) \big\},$$

respectively.

The proofs extensively use results from Goldenshluger and Lepski (2010); in what follows, for the sake of brevity, we refer to this paper as GL (2010).

4.1 Auxiliary results

We start with two auxiliary lemmas that establish probability and moment bounds on $\mathbb{L}_s$-norms of the processes $\xi_h$ and $\xi_{h,\eta}$. Proofs of these results are given in the Appendix.

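Lemma 1. Let Assumptions (K1) and (K2) hold.

(i) If $s \in [1,2)$ then for all $n \ge 4^{2s/(2-s)}$ one has

$$\Big\{ \mathbb{E}_f \sup_{h \in \mathcal{H}} \big[ \|\xi_h\|_s - 32 \rho_s(K_h) \big]_+^q \Big\}^{1/q} \le \delta^{(1)}_{n,s} := C_1 A_{\mathcal{H}}^{2/q}\, n^{1/s} \exp\Big\{ -\frac{2 n^{2/s-1}}{37 q} \Big\}, \tag{14}$$

$$\Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta}\|_s - 32 \rho_s(K_h * K_\eta) \big]_+^q \Big\}^{1/q} \le \delta^{(2)}_{n,s} := C_2 A_{\mathcal{H}}^{4/q}\, n^{1/s} \exp\Big\{ -\frac{2 n^{2/s-1}}{37 q} \Big\}. \tag{15}$$

(ii) Let $f \in \mathbb{F}$, and assume that $8[f_\infty^2 V_{\max} + 4 n^{-1/2}] \le 1$; then for all $f \in \mathbb{F}$ one has

$$\Big\{ \mathbb{E}_f \sup_{h \in \mathcal{H}} \big[ \|\xi_h\|_2 - \tfrac{25}{3} \rho_2(K_h) \big]_+^q \Big\}^{1/q} \le \delta^{(1)}_{n,2} := C_3 A_{\mathcal{H}}^{2/q}\, n^{1/2} \exp\Big\{ -\frac{1}{16 q\, [f_\infty^2 V_{\max} + 4 n^{-1/2}]} \Big\}, \tag{16}$$

$$\Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta}\|_2 - \tfrac{25}{3} \rho_2(K_h * K_\eta) \big]_+^q \Big\}^{1/q} \le \delta^{(2)}_{n,2} := C_4 A_{\mathcal{H}}^{4/q}\, n^{1/2} \exp\Big\{ -\frac{1}{16 q\, [f_\infty^2 V_{\max} + 4 n^{-1/2}]} \Big\}. \tag{17}$$

The constants $C_i$, $i = 1, \ldots, 4$, depend on $L_K$, $k_\infty$, $d$ only.

Lemma 2. Let Assumptions (K1) and (K2) hold, $f \in \mathbb{F}$, $s > 2$, and assume that

$$n \ge C_1, \qquad nV_{\min} > C_2, \qquad V_{\max} \ge 1/\sqrt{n}.$$

Then the following statements hold:

$$\Big\{ \mathbb{E}_f \sup_{h \in \mathcal{H}} \big[ \|\xi_h\|_s - 32 \hat r_s(K_h) \big]_+^q \Big\}^{1/q} \le \delta^{(1)}_{n,s} := C_3 A_{\mathcal{H}}^{2/q} B_{\mathcal{H}}^{1/q}\, n^{1/2} \exp\Big\{ -\frac{C_4}{f_\infty V_{\max}^{2/s}} \Big\}, \tag{18}$$

$$\Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta}\|_s - 32 \hat r_s(K_h * K_\eta) \big]_+^q \Big\}^{1/q} \le \delta^{(2)}_{n,s} := C_5 A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{1/2} \exp\Big\{ -\frac{C_6}{f_\infty V_{\max}^{2/s}} \Big\}. \tag{19}$$

In addition, for any $\mathcal{H}_1 \subseteq \mathcal{H}$ and $\mathcal{H}_2 \subseteq \mathcal{H}$

$$\mathbb{E}_f \sup_{h \in \mathcal{H}_1} [\hat r_s(K_h)]^q \le (1 + 8c_s)^q \sup_{h \in \mathcal{H}_1} [r_s(K_h)]^q + C_7 A_{\mathcal{H}}^2 B_{\mathcal{H}}\, n^{q(s-2)/(2s)} \exp\{-C_8 b_{n,s}\}, \tag{20}$$

$$\mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H}_1 \times \mathcal{H}_2} [\hat r_s(K_h * K_\eta)]^q \le (1 + 8c_s)^q \sup_{(h,\eta) \in \mathcal{H}_1 \times \mathcal{H}_2} [r_s(K_h * K_\eta)]^q + C_9 A_{\mathcal{H}}^4 B_{\mathcal{H}}\, n^{q(s-2)/(2s)} \exp\{-C_{10} b_{n,s}\}, \tag{21}$$

where $b_{n,s} := n^{4/s-1}$ if $s \in (2,4)$, and $b_{n,s} := [f_\infty V_{\max}]^{-1}$ if $s \in [4,\infty)$. The constants $C_i$, $i = 2, \ldots, 10$, depend on $L_K$, $k_\infty$, $d$, $q$ and $s$ only, while $C_1$ depends also on $f_\infty$.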

4.2 Proof of Theorems 1 and 2

The proofs of both theorems (which we break into several steps) follow along the same lines.

1°. First we show that for any $h, \eta \in \mathcal{H}$

$$B_{h,\eta}(f,x) = B_\eta(f,x) + \int K_\eta(y - x) B_h(f,y)\,dy \tag{22}$$

$$\phantom{B_{h,\eta}(f,x)} = B_h(f,x) + \int K_h(y - x) B_\eta(f,y)\,dy. \tag{23}$$

Indeed, by the Fubini theorem

$$\begin{aligned} \int [K_h * K_\eta](t - x) f(t)\,dt &= \int \Big[ \int K_h(t - y) K_\eta(y - x)\,dy \Big] f(t)\,dt \\ &= \int \Big[ \int K_h(t - y) f(t)\,dt - f(y) \Big] K_\eta(y - x)\,dy + \int K_\eta(y - x) f(y)\,dy \\ &= \int K_\eta(y - x) f(y)\,dy + \int K_\eta(y - x) B_h(f,y)\,dy. \end{aligned}$$

Subtracting $f(x)$ from both sides of the last equality we come to (22); (23) follows similarly.

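2°. Let $m_s(\cdot,\cdot)$ and $m_s^*(\cdot)$ be given by (6), and define

$$\delta_{n,s} := \Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta} - \xi_\eta\|_s - m_s(h,\eta) \big]_+^q \Big\}^{1/q}. \tag{24}$$

Let $\hat f = \hat f_{\hat h}$ be the estimator defined in (6)-(8). Our first goal is to prove that

$$\mathcal{R}_s[\hat f; f] \le \inf_{h \in \mathcal{H}} \Big\{ (1 + 3\|K\|_1)\, \mathcal{R}_s[\hat f_h; f] + 2 \big( \mathbb{E}_f [m_s^*(h)]^q \big)^{1/q} \Big\} + 3\delta_{n,s}. \tag{25}$$

By the triangle inequality for any $\eta \in \mathcal{H}$

$$\|\hat f_{\hat h} - f\|_s \le \|\hat f_{\hat h} - \hat f_{\hat h,\eta}\|_s + \|\hat f_{\hat h,\eta} - \hat f_\eta\|_s + \|\hat f_\eta - f\|_s, \tag{26}$$

and we are going to bound the first two terms on the right hand side.

Define

$$\bar B_h(f) := \sup_{\eta \in \mathcal{H}} \Big\| \int K_\eta(t - \cdot) B_h(f,t)\,dt \Big\|_s, \qquad h \in \mathcal{H}.$$

We have for any $h \in \mathcal{H}$

$$\begin{aligned} \hat R_h - m_s^*(h) &= \sup_{\eta \in \mathcal{H}} \big[ \|\hat f_{h,\eta} - \hat f_\eta\|_s - m_s(h,\eta) \big]_+ \\ &\le \sup_{\eta \in \mathcal{H}} \big[ \|B_{h,\eta}(f,\cdot) - B_\eta(f,\cdot)\|_s + \|\xi_{h,\eta} - \xi_\eta\|_s - m_s(h,\eta) \big]_+ \\ &\le \bar B_h(f) + \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta} - \xi_\eta\|_s - m_s(h,\eta) \big]_+ =: \bar B_h(f) + \zeta. \end{aligned}$$

Here the second line is by the triangle inequality and the third line is by (22) and the definition of $\bar B_h(f)$. Therefore for any $h \in \mathcal{H}$ one has

$$\hat R_h \le \bar B_h(f) + m_s^*(h) + \zeta. \tag{27}$$

By (22), (23) for any $h, \eta \in \mathcal{H}$

$$\begin{aligned} \|\hat f_{h,\eta} - \hat f_h\|_s &\le \|B_{h,\eta}(f,\cdot) - B_h(f,\cdot)\|_s + \|\xi_{h,\eta} - \xi_h\|_s \\ &\le \bar B_\eta(f) + \zeta + \sup_{\eta \in \mathcal{H}} m_s(\eta,h) \\ &= \bar B_\eta(f) + m_s^*(h) + \zeta \le \bar B_\eta(f) + \hat R_h + \zeta, \end{aligned}$$

where the last inequality is by the definition of $\hat R_h$. In particular, letting $h = \hat h$ we have that for any $\eta \in \mathcal{H}$

$$\|\hat f_{\hat h,\eta} - \hat f_{\hat h}\|_s \le \bar B_\eta(f) + \hat R_{\hat h} + \zeta \le \bar B_\eta(f) + \hat R_\eta + \zeta \le 2\bar B_\eta(f) + m_s^*(\eta) + 2\zeta, \tag{28}$$

where we have used that $\hat R_{\hat h} \le \hat R_\eta$, $\forall \eta \in \mathcal{H}$, and (27).

Furthermore, for any $\eta \in \mathcal{H}$

$$\|\hat f_{\hat h,\eta} - \hat f_\eta\|_s = \|\hat f_{\hat h,\eta} - \hat f_\eta\|_s - m_s(\hat h,\eta) + m_s(\hat h,\eta) \le \hat R_{\hat h} \le \hat R_\eta \le \bar B_\eta(f) + m_s^*(\eta) + \zeta, \tag{29}$$

where the first inequality is by the definition of $\hat R_h$, the second inequality is by the definition of $\hat h$, and the last inequality follows from (27).

Combining (26), (28) and (29) we get for any $\eta \in \mathcal{H}$ that

$$\|\hat f_{\hat h} - f\|_s \le \|\hat f_{\hat h} - \hat f_{\hat h,\eta}\|_s + \|\hat f_{\hat h,\eta} - \hat f_\eta\|_s + \|\hat f_\eta - f\|_s \le \|\hat f_\eta - f\|_s + 3\bar B_\eta(f) + 2 m_s^*(\eta) + 3\zeta.$$

Taking this expression to the power $q$, computing the expectation and using the fact that $[\mathbb{E}_f |\zeta|^q]^{1/q} = \delta_{n,s}$ we obtain

$$\mathcal{R}_s[\hat f; f] \le \inf_{h \in \mathcal{H}} \Big\{ \mathcal{R}_s[\hat f_h; f] + 3\bar B_h(f) + 2 \big( \mathbb{E}_f [m_s^*(h)]^q \big)^{1/q} \Big\} + 3\delta_{n,s}. \tag{30}$$

By the Young inequality $\bar B_h(f) \le \big( \sup_{\eta \in \mathcal{H}} \|K_\eta\|_1 \big) \|B_h(f,\cdot)\|_s = \|K\|_1 \|B_h(f,\cdot)\|_s$. In addition, see (36),

$$\|B_h(f,\cdot)\|_s \le \mathcal{R}_s[\hat f_h; f], \qquad \forall h \in \mathcal{H}.$$

Combining this with (30) we complete the proof of (25).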

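3°. Lemmas 1 and 2 lead to an upper bound on the quantity $\delta_{n,s}$ given in (24). Indeed, by the definition of $m_s(\cdot,\cdot)$ [see (6)] we have

$$\begin{aligned} \Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta} - \xi_\eta\|_s - m_s(h,\eta) \big]_+^q \Big\}^{1/q} &\le \Big\{ \mathbb{E}_f \sup_{(h,\eta) \in \mathcal{H} \times \mathcal{H}} \big[ \|\xi_{h,\eta}\|_s - g_s(K_h * K_\eta) \big]_+^q \Big\}^{1/q} + \Big\{ \mathbb{E}_f \sup_{h \in \mathcal{H}} \big[ \|\xi_h\|_s - g_s(K_h) \big]_+^q \Big\}^{1/q} \\ &\le \delta^{(2)}_{n,s} + \delta^{(1)}_{n,s}, \end{aligned} \tag{31}$$

where expressions for $\delta^{(1)}_{n,s}$ and $\delta^{(2)}_{n,s}$, depending on the value of $s \in [1,\infty)$, are given in (14)-(15), (16)-(17), and (18)-(19).

In order to apply (25) it remains to bound $\{\mathbb{E}_f [m_s^*(h)]^q\}^{1/q}$.

4°. We start with the case $s \in [1,2)$. Here, by definition,

$$\begin{aligned} m_s^*(h) = \sup_{\eta \in \mathcal{H}} m_s(\eta,h) &= g_s(K_h) + \sup_{\eta \in \mathcal{H}} g_s(K_\eta * K_h) \\ &= 128\, n^{1/s-1} \Big( \|K_h\|_s + \sup_{\eta \in \mathcal{H}} \|K_\eta * K_h\|_s \Big) \le 128\, [1 + \|K\|_1]\, \|K\|_s\, (nV_h)^{1/s-1}. \end{aligned}$$

Therefore, applying (25) and taking into account (31), (14) and (15) we come to statement (i) of Theorem 1.

Statement (ii) of Theorem 1, dealing with the case $s = 2$, follows similarly by application of (25), (31), (16) and (17). This completes the proof of Theorem 1.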

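5°. Now consider the case $s \in (2,\infty)$. Because

$$m_s^*(h) = \sup_{\eta \in \mathcal{H}} m_s(\eta,h) = g_s(K_h) + \sup_{\eta \in \mathcal{H}} g_s(K_\eta * K_h) = 32\, \hat r_s(K_h) + 32 \sup_{\eta \in \mathcal{H}} \hat r_s(K_\eta * K_h), \tag{32}$$

it suffices to bound from above $[\mathbb{E}_f |\hat r_s(K_h)|^q]^{1/q}$ and $[\mathbb{E}_f \sup_{\eta \in \mathcal{H}} |\hat r_s(K_h * K_\eta)|^q]^{1/q}$. Using (20) of Lemma 2 with $\mathcal{H}_1 = \{h\}$ we have

$$\big[ \mathbb{E}_f |\hat r_s(K_h)|^q \big]^{1/q} \le c_1 r_s(K_h) + c_2 A_{\mathcal{H}}^{2/q} B_{\mathcal{H}}^{1/q}\, n^{(s-2)/(2s)} \exp\{-c_3 b_{n,s}\}.$$

In addition, by the Young inequality

$$\begin{aligned} \rho_s(K_h) &= c_s \Big\{ n^{-1/2} \big\| K_h^2 * f \big\|_{s/2}^{1/2} + 2 n^{1/s-1} \|K_h\|_s \Big\} \\ &\le c_s \Big\{ n^{-1/2} \|K_h\|_2\, \|\sqrt{f}\|_s + 2 (nV_h)^{-1+1/s} \|K\|_s \Big\} \\ &\le c_s \Big\{ \|K\|_2\, f_\infty^{1/2 - 1/s} (nV_h)^{-1/2} + 2 \|K\|_s (nV_h)^{-1+1/s} \Big\} \le c_4 f_\infty^{1/2} (nV_h)^{-1/2}; \end{aligned}$$

hence

$$\big[ \mathbb{E}_f |\hat r_s(K_h)|^q \big]^{1/q} \le c_5 f_\infty^{1/2} (nV_h)^{-1/2} + c_2 A_{\mathcal{H}}^{2/q} B_{\mathcal{H}}^{1/q}\, n^{(s-2)/(2s)} \exp\{-c_3 b_{n,s}\}. \tag{33}$$

Now, applying (21) with $\mathcal{H}_1 = \{h\}$ and $\mathcal{H}_2 = \mathcal{H}$ we obtain

$$\Big[ \mathbb{E}_f \sup_{\eta \in \mathcal{H}} |\hat r_s(K_h * K_\eta)|^q \Big]^{1/q} \le c_6 \sup_{\eta \in \mathcal{H}} r_s(K_h * K_\eta) + c_7 A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{(s-2)/(2s)} \exp\{-c_8 b_{n,s}\}.$$

In addition, similarly to the above,

$$\sup_{\eta \in \mathcal{H}} \rho_s(K_h * K_\eta) \le \sup_{\eta \in \mathcal{H}} c_s \Big\{ n^{-1/2} \|K_h * K_\eta\|_2\, \|\sqrt{f}\|_s + 2 n^{-1+1/s} \|K_h * K_\eta\|_s \Big\} \le c_9 f_\infty^{1/2} (nV_h)^{-1/2}.$$

Therefore the last two bounds yield

$$\Big[ \mathbb{E}_f \sup_{\eta \in \mathcal{H}} |\hat r_s(K_h * K_\eta)|^q \Big]^{1/q} \le c_{10} f_\infty^{1/2} (nV_h)^{-1/2} + c_7 A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{(s-2)/(2s)} \exp\{-c_8 b_{n,s}\}.$$

This along with (33) and (32) results in

$$\big[ \mathbb{E}_f |m_s^*(h)|^q \big]^{1/q} \le c_{11} f_\infty^{1/2} (nV_h)^{-1/2} + c_{12} A_{\mathcal{H}}^{4/q} B_{\mathcal{H}}^{1/q}\, n^{(s-2)/(2s)} \exp\{-c_{13} b_{n,s}\}.$$

Combining this bound with (18), (19) and (31), and applying (25) we complete the proof of Theorem 2.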



4.3 Proof of Theorem 3

Throughout the proof we denote by $c_0, c_1, \ldots$ positive constants depending only on the kernel $K$, the index $s$ and the quantity $f_\infty$. We divide the proof into several steps.

1°. Let us prove that for any $q \ge 1$ and $h \in \mathcal{H}$

$$3 \mathcal{R}_s[\hat f_h; f] \ge \|B_h(f)\|_s + \mathbb{E}_f \|\xi_h\|_s. \tag{34}$$

Indeed, in view of the Jensen inequality for any $q \ge 1$

$$\mathcal{R}_s[\hat f_h; f] \ge \mathbb{E}_f \|\hat f_h - f\|_s = \mathbb{E}_f \|B_h(f) + \xi_h\|_s. \tag{35}$$

Denote by $\mathbb{B}_p(1)$, $1 \le p \le \infty$, the unit ball in $\mathbb{L}_p(\mathbb{R}^d)$. By the duality argument

$$\mathbb{E}_f \|B_h(f) + \xi_h\|_s = \mathbb{E}_f \sup_{\ell \in \mathbb{B}_r(1)} \int \ell(t) \big[ B_h(f,t) + \xi_h(t) \big]\,dt, \qquad r = \frac{s}{s-1}.$$

Let $\ell^* \in \mathbb{B}_r(1)$ be such that $\|B_h(f)\|_s = \int \ell^*(t) B_h(f,t)\,dt$; then

$$\mathbb{E}_f \|B_h(f) + \xi_h\|_s \ge \mathbb{E}_f \int \ell^*(t) \big[ B_h(f,t) + \xi_h(t) \big]\,dt = \|B_h(f)\|_s. \tag{36}$$

Here we have used that $\mathbb{E}_f \xi_h(t) = 0$, $\forall t \in \mathbb{R}^d$. We also have by the triangle inequality

$$\mathbb{E}_f \|B_h(f) + \xi_h\|_s \ge \mathbb{E}_f \|\xi_h\|_s - \|B_h(f)\|_s. \tag{37}$$

Summing up the inequalities in (36) and (37) we get

$$\mathbb{E}_f \|B_h(f) + \xi_h\|_s \ge 2^{-1}\, \mathbb{E}_f \|\xi_h\|_s. \tag{38}$$

Thus, in view of (36) and (38), for any $a \in (0,1)$

$$\mathbb{E}_f \|B_h(f) + \xi_h\|_s \ge (1 - a) \|B_h(f)\|_s + 2^{-1} a\, \mathbb{E}_f \|\xi_h\|_s. \tag{39}$$

Choosing $a = 2/3$ we arrive at (34) in view of (35).

In view of (34), the assertion of the theorem will follow from the statement of Theorem 2 if we show that

$$\mathbb{E}_f \|\xi_h\|_s \ge c_0 (nV_h)^{-1/2}.$$
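The choice $a = 2/3$ in step 1° is the one that balances the two coefficients in the convex combination of the lower bounds $\|B_h(f)\|_s$ and $2^{-1} \mathbb{E}_f \|\xi_h\|_s$. Spelled out as a short worked step (added here for the reader's convenience):

$$1 - a = \tfrac{1}{3}, \qquad \tfrac{a}{2} = \tfrac{1}{3} \quad \Longrightarrow \quad \mathbb{E}_f \|B_h(f) + \xi_h\|_s \ge \tfrac{1}{3} \Big( \|B_h(f)\|_s + \mathbb{E}_f \|\xi_h\|_s \Big).$$

Since the left hand side is bounded above by the risk of $\hat f_h$ via the Jensen inequality, multiplying through by $3$ gives the factor $3$ in front of the risk in the lower bound for $\|B_h(f)\|_s + \mathbb{E}_f \|\xi_h\|_s$.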

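2°. Let $b > 0$ be a constant to be specified, and put $a = b^{-1} \sqrt{nV_h}$. By duality

$$\mathbb{E}_f \|\xi_h\|_s = \mathbb{E}_f \sup_{\ell \in \mathbb{B}_r(1)} \int \ell(t)\, \xi_h(t)\,dt, \qquad r = \frac{s}{s-1}. \tag{40}$$

Define the random event $A = \{a \xi_h \in \mathbb{B}_2(1)\}$, and note that if $A$ occurs then by the Hölder inequality

$$a\, \vartheta\, \xi_h \in \mathbb{B}_r(1), \qquad \forall \vartheta \in \mathbb{B}_{\frac{2r}{2-r}}(1). \tag{41}$$

Recall that $s \ge 2$ implies $r \in [1,2]$, and if $r = s = 2$ then we formally put $\frac{2r}{2-r} = \infty$.

If the event $A$ occurs then $\mathbb{B}_r(1) \supseteq \{a\, \vartheta\, \xi_h : \vartheta \in \mathbb{B}_{\frac{2r}{2-r}}(1)\}$. Therefore, by (40) and (41)

$$\begin{aligned} \mathbb{E}_f \|\xi_h\|_s &\ge a\, \mathbb{E}_f \Big[ \mathbf{1}(A) \sup_{\vartheta \in \mathbb{B}_{\frac{2r}{2-r}}(1)} \int \vartheta(t)\, \xi_h^2(t)\,dt \Big] \ge a \sup_{\vartheta \in \mathbb{B}_{\frac{2r}{2-r}}(1)} \mathbb{E}_f \Big[ \mathbf{1}(A) \int \vartheta(t)\, \xi_h^2(t)\,dt \Big] \\ &= a \sup_{\vartheta \in \mathbb{B}_{\frac{2r}{2-r}}(1)} \int \vartheta(t) \big[ \mathbb{E}_f \mathbf{1}(A)\, \xi_h^2(t) \big]\,dt = a\, \big\| \mathbb{E}_f\, \xi_h^2(\cdot)\, \mathbf{1}(A) \big\|_{\frac{2s}{s+2}} \\ &\ge a \Big[ \big\| \mathbb{E}_f\, \xi_h^2(\cdot) \big\|_{\frac{2s}{s+2}} - \big\| \mathbb{E}_f\, \xi_h^2(\cdot)\, \mathbf{1}(\bar A) \big\|_{\frac{2s}{s+2}} \Big], \end{aligned} \tag{42}$$

where $\bar A$ is the event complementary to $A$.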

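Now consider separately two cases: $s = 2$ and $s > 2$.

3°. If $s = 2$ we get from (42)

$$\mathbb{E}_f \|\xi_h\|_2 \ge a \bigg[ \int \mathbb{E}_f\, \xi_h^2(t)\,dt - \mathbb{E}_f \Big\{ \|\xi_h\|_2^2\, \mathbf{1}\Big( \|\xi_h\|_2 > \frac{b}{\sqrt{nV_h}} \Big) \Big\} \bigg]. \tag{43}$$

Note that

$$\mathbb{E}_f\, \xi_h^2(t) = n^{-1} \int K_h^2(t - x) f(x)\,dx - n^{-1} \Big[ \int K_h(t - x) f(x)\,dx \Big]^2, \tag{44}$$

and, therefore,

$$\int \mathbb{E}_f\, \xi_h^2(t)\,dt = \frac{\|K\|_2^2}{nV_h} - n^{-1} \int \Big[ \int K_h(t - x) f(x)\,dx \Big]^2 dt.$$

The application of Young's inequality yields

$$\int \Big[ \int K_h(t - x) f(x)\,dx \Big]^2 dt \le \|K_h\|_1^2\, \|f\|_2^2 \le \|K\|_1^2\, f_\infty. \tag{45}$$

Here we have used that $f \in \mathbb{F}$. Thus we obtain, in view of $V_h \le V_{\max} \le 1/8$ [see the assumption of part (ii) of Theorem 1],

$$\int \mathbb{E}_f\, \xi_h^2(t)\,dt \ge \frac{\|K\|_2^2}{nV_h} - \frac{\|K\|_1^2\, f_\infty}{n} \ge c_1 (nV_h)^{-1}. \tag{46}$$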
It follows from Theorem 1 of GL (2010) that for any x > 2

x\\K\W 



ip iiai| 2 > 



VnVh 



< c 2 (l-x) 



(47) 



and, therefore, putting b = y\\K\\2, y > 2, we obtain 

% { ll^lll I (JI6JI2 > } ^ nKWUnVny 1 ^ xe^'^dx. (48) 

Choosing y sufficiently large in order to make latter integral less than ci / (4| | 1 1 2 ) we obtain 
from ggj) , (gg) and (j38|) 

E/H&lla > c 3 (nT4)- 1/2 . 
The theorem is proved in the case s = 2. 

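4°. Return now to the case $s > 2$. Note first that

$$\big\| \mathbb{E}_f\, \xi_h^2(\cdot) \big\|_{\frac{2s}{s+2}} \ge \bigg\{ \int_B \big[ \mathbb{E}_f\, \xi_h^2(t) \big]^{\frac{2s}{s+2}} dt \bigg\}^{\frac{s+2}{2s}} \ge \nu^{\frac{2-s}{2s}} \int_B \mathbb{E}_f\, \xi_h^2(t)\,dt. \tag{49}$$

The last relation is obtained by the Hölder inequality. Taking into account that $\int_B f(t)\,dt \ge \mu$, we get, using (44) and (45),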



E f e h (t)dt > 



mII^III 



1*11? £ 



1 J-oo 



> Cifi(nV h ) x . 



nVh n 

Here we have used that Vh < 2 _1 /i||K|||/||i^||f . On the other hand 



(50) 



E,$(0 l(-4) < {e, [&(0]^} " 2 {p(^)} 



s-2 
2s 



and, therefore, 



2s 
s+2 



4s 
s+2 



We derive from Theorem 1 in GL (2010) that there exists C5 such that 

4s 

E/( ||ail^) S+2 <C5(nV h )-&. 

\ s+2 J 

Putting $b = x\|K\|_2$, $x > 2$, we have in view of (47)

$$\big[P_f(\bar{A})\big]^{\frac{s-2}{2s}} \le e^{\frac{c_2(1-x)(s-2)}{2s}}.$$

Together with (51) and (52), this leads to the following estimate.



(51) 



(52) 



c 2 (\-x)(s-2) 



2s 
s+2 



(53) 



We obtain finally from (42), (49), (50) and (53)

— 1 / \— 1/2 / 2-s c 2 (l — x)(s — 2) 

> N^-lb) \nV h ) (c A fiu^-c e e 21 

It remains to choose x sufficiently large and we come to the assertion of the theorem in the 
case s > 2. ■ 
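The exponents appearing in step 4° come from Hölder's inequality with index $p = 2s/(s+2)$: for a nonnegative function $g$ and a set $B$ of finite measure, $\int_B g \le \|g\|_p\,[\mathrm{mes}(B)]^{1-1/p}$, and bounding $E_f\,\xi_h^2(t)\mathbf{1}(\bar{A})$ in the same way requires a moment of order $2p$. The following sketch checks this exponent arithmetic with exact rational numbers; the function name and the sample values of $s$ are chosen only for illustration.

```python
from fractions import Fraction

def holder_exponents(s):
    """Exponents used for s > 2: p = 2s/(s+2), 1 - 1/p, and the moment order 2p."""
    s = Fraction(s)
    p = 2 * s / (s + 2)                 # index of the norm, 2s/(s+2)
    one_minus_inv_p = 1 - 1 / p         # power of the measure of B in Holder's inequality
    moment_order = 2 * p                # moment of xi_h needed after Holder, 4s/(s+2)
    return p, one_minus_inv_p, moment_order

for s in (3, 4, 10):
    p, gap, moment = holder_exponents(s)
    assert gap == Fraction(s - 2, 2 * s)        # hence the power (2-s)/(2s) after rearranging
    assert moment == Fraction(4 * s, s + 2)
    assert 1 < p < 2 and 2 < moment < 4
    print(s, p, gap, moment)
```

In particular the same quantity $(s-2)/(2s)$ governs both the power of the measure of $B$ and the power of $P_f(\bar{A})$, and it vanishes at $s = 2$, which is why that case is treated separately.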
4.4 Proof of Theorem 4



Let $f \in N_{s,d}(\alpha, L)$. It is easily checked [see, e.g., Proposition 3 in Kerkyacharian, Lepski and
Picard (2001)] that the bias of the estimator $\hat f_h$ is bounded as follows:

$$\|B_h(f,\cdot)\|_s \le C_1\sum_{i=1}^{d} L_i h_i^{\alpha_i}.$$

Moreover, $\{E_f\|\xi_h\|_s^q\}^{1/q} \le C_2(nV_h)^{-\gamma_s}$. If we set the "oracle bandwidth" $h^* := (h_1^*, \ldots, h_d^*)$
so that

[h*r ■= 

then h* £ % and G J~(H) for large enough n. Hence, for any / £ N Sj d(a, L) we have that 
$\mathcal{R}_s[\hat f_{h^*}; f] \le C_3\varphi_{n,s}(\alpha)$. Then we apply the oracle inequalities of Theorems 1 and 2. Observe
that by the choice of the constant $\varkappa_2$ in the definition of $h_{\max}$ we guarantee that the remainder terms
are negligibly small in terms of dependence on $n$ as compared with the first terms in (10)
and (11). This fact leads to the statement of the theorem. ■



9i 

Ci 



L -a/( 7s +a) n - 7s a/( 7s +a) ) ^ = ^ ^ 
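The oracle bandwidth balances the bias bound against the bound on the stochastic error. The sketch below illustrates this balance under simplifying assumptions that are not taken from the proof: all multiplicative constants are set to one, each coordinate contributes equally ($L_i h_i^{\alpha_i} = \delta$ for every $i$), and $\delta$ is chosen so that $\delta = (nV_h)^{-\gamma_s}$. Writing $1/\alpha := \sum_i 1/\alpha_i$ (assumed here to be the relevant effective smoothness), this gives $\delta$ of order $n^{-\gamma_s\alpha/(\gamma_s+\alpha)}$. The function name and the numerical inputs are hypothetical.

```python
import math

def balanced_bandwidth(n, alphas, Ls, gamma):
    """Solve L_i * h_i**alpha_i = delta for all i, with delta = (n * V_h)**(-gamma)."""
    inv_alpha = sum(1.0 / a for a in alphas)          # 1/alpha = sum_i 1/alpha_i
    log_L_star = sum(math.log(L) / a for a, L in zip(alphas, Ls))
    # delta**(1 + gamma/alpha) = n**(-gamma) * prod_i L_i**(gamma/alpha_i)
    log_delta = gamma * (log_L_star - math.log(n)) / (1.0 + gamma * inv_alpha)
    delta = math.exp(log_delta)
    h = [(delta / L) ** (1.0 / a) for a, L in zip(alphas, Ls)]
    return h, delta

n, alphas, Ls, gamma = 10_000, [2.0, 1.0], [1.0, 3.0], 0.5   # illustrative values
h, delta = balanced_bandwidth(n, alphas, Ls, gamma)
V_h = math.prod(h)
print(h, delta, (n * V_h) ** (-gamma))
```

With these bandwidths every bias term $L_i h_i^{\alpha_i}$ and the stochastic term $(nV_h)^{-\gamma_s}$ coincide, so neither dominates the risk bound.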



5 Appendix 

Proofs of Lemmas 1 and 2 follow directly from general uniform bounds on norms of empirical
processes established in GL (2010). In our proofs below we use notation and terminology
of the aforecited paper.

Proof of Lemma 1. The statement is a direct consequence of Theorem 4 of Section 3.3
in GL (2010).

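To apply this theorem one should verify Assumptions (W1), (W4), and (L) for the
following classes of weights: $\mathcal{W}^{(1)} = \{w = n^{-1}K_h : h \in \mathcal{H}\}$ and $\mathcal{W}^{(2)} = \{w = n^{-1}(K_h * K_\eta) : (h,\eta) \in \mathcal{H}\times\mathcal{H}\}$. The sets $\mathcal{W}^{(1)}$ and $\mathcal{W}^{(2)}$ are considered as images of $\mathcal{H}$ and $\mathcal{H}\times\mathcal{H}$ under the
transformations $h \mapsto n^{-1}K_h$ and $(h,\eta) \mapsto n^{-1}(K_h * K_\eta)$ respectively. The sets $\mathcal{H}$ and $\mathcal{H}\times\mathcal{H}$
are equipped with the distances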

$$d_1(h,h') = c_1\max_{i=1,\ldots,d}\ln\Big(\frac{h_i\vee h_i'}{h_i\wedge h_i'}\Big), \qquad d_2[(h,\eta),(h',\eta')] = c_2\{d_1(h,h')\vee d_1(\eta,\eta')\},$$

where $c_1$ and $c_2$ are appropriate constants depending on $k_\infty$, $L_K$ and $d$ only [see formulae
(9.1)-(9.2) in GL (2010)]. With this notation Lemma 9 of GL (2010) shows that Assumption (L)
holds for both $\mathcal{W}^{(1)}$ and $\mathcal{W}^{(2)}$. Moreover, Assumption (W1) holds trivially both for
$\mathcal{W}^{(1)}$ and $\mathcal{W}^{(2)}$ with $\mu_* = V_{\max}$ and $\mu_* = 2^d V_{\max}$ respectively. Finally, Assumption (W4)
for both $\mathcal{W}^{(1)}$ and $\mathcal{W}^{(2)}$ follows from formula (9.8) in GL (2010). Thus all conditions of
Theorem 4 are fulfilled.

(i). We apply this theorem with $z = 1$ and $\epsilon = 1$. We need to evaluate the constant
T3 i£ for and . If N-}i^ 1 (e) denotes the minimal number of balls in the metric di 
needed to cover 7i, then formula (9.8) from GL (2010) shows that A^di(l/8) < c^A-^, 
where C3 depends on d only. Similarly, N^ x y_ : d 2 (1/8) < c^A^. In addition, for 

$$L_{\mathcal{H},d_1}(\epsilon) := \sum_{k=1}^{\infty}\exp\big\{2\ln N_{\mathcal{H},d_1}(\epsilon 2^{-k}) - (9/16)\,2^{k}k^{-2}\big\}$$
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we have L^ i d 1 (l) < c^A-^. Similarly, L%xW,d 2 (l) — cqA^. Combining these bounds we 
come to the statement (i). 
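The series defining $L_{\mathcal{H},d_1}$ is finite because $(9/16)2^k k^{-2}$ grows much faster in $k$ than $2\ln N_{\mathcal{H},d_1}(\epsilon 2^{-k})$ whenever the covering numbers grow polynomially in $1/\epsilon$. The sketch below evaluates partial sums for a hypothetical covering number $N(\epsilon) = A\,\epsilon^{-d}$; this form, and the values of $A$ and $d$, are assumptions made only for illustration and are not the bounds of GL (2010).

```python
import math

def entropy_series(eps, log_cover, terms):
    """Partial sum of sum_{k>=1} exp{2 ln N(eps 2^-k) - (9/16) 2^k k^-2}."""
    total = 0.0
    for k in range(1, terms + 1):
        exponent = 2.0 * log_cover(eps * 2.0 ** (-k)) - (9.0 / 16.0) * 2.0 ** k / k ** 2
        total += math.exp(exponent)
    return total

A, d = 2.0, 1                                              # hypothetical constants
log_cover = lambda e: math.log(A) + d * math.log(1.0 / e)  # ln N(e) for N(e) = A * e**(-d)

partial = [entropy_series(1.0, log_cover, m) for m in (20, 40, 60)]
print(partial)
```

The partial sums stabilise after a few dozen terms, although the limiting value can be numerically large, since the first terms of the series grow before the factor $2^k k^{-2}$ takes over.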

(ii). The second statement follows in exactly the same way from the above considerations.
Theorem 4 of GL (2010) is again applied with $z = 1$ and $\epsilon = 1$. ■



Proof of Lemma 2. The proof is by application of Theorem 7 from GL (2010). We need
to calculate several quantities.

We start with the class . Here for = Wc s i oc (L K Vd) d / 2 we have 

CUV) = l + 2^{^(V±+n-^)+yn-^} 

where we have used that V max > 1/y/n. If we set y = y := [4T^4^ (^q 1 -* VI)] -1 then C^ x {y) < 
4. We apply Theorem 7 with e = 1 and y = y. Condition nV min > C x = [256c2]( sA4 )/( sA4 " 2 ) 
implies that 

fi!( 7 ) = 4[1 - 8c s (nVi lin ) 1 /(«A4)-i/2 ] -i < 8 _ 

Moreover, we note that condition y < yi 1 ^ follows from definition of y and n > C%. In 
addition, ffj < cA 2 n B H . These facts imply (18) and (20).

The bounds (19) and (21) for $\mathcal{W}^{(2)}$ follow from similar computations. ■